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ABSTRACT 

This paper investigates the property of the penalized estimating equations when both 
the mean and association structures are modelled. To select variables for the mean and 
association structures sequentially, we propose a hierarchical penalized generalized esti- 
mating equations (HPGEE2) approach. The first set of penalized estimating equations 
is solved for the selection of significant mean parameters. Conditional on the selected 
mean model, the second set of penalized estimating equations is solved for the selection 
of significant association parameters. The hierarchical approach is designed to accommo- 
date possible model constraints relating the inclusion of covariates into the mean and the 
association models. This two-step penalization strategy enjoys a compelling advantage 
of easing computational burdens compared to solving the two sets of penalized equations 
simultaneously. HPGEE2 with a smoothly clipped absolute deviation (SCAD) penalty is 
shown to have the oracle property for the mean and association models. The asymptotic 
behavior of the penalized estimator under this hierarchical approach is established. An ef- 
ficient two-stage penalized weighted least square algorithm is developed to implement the 
proposed method. The empirical performance of the proposed HPGEE2 is demonstrated 
through Monte-Carlo studies and the analysis of a clinical data set. 

Key words: association; clustered binary data; generalized estimating equation; logistic 
regression; variable selection. 

1. Introduction 

Clustered binary data arise in many application areas of the biological and social sciences. 
For example, the disease status of members within a family is correlated due to the 
sharing of common genetic factors and environmental background. The objectives of 
statistical analysis often focus on modeling of the relationship between the response and 
explanatory variables and the association among response measurements within clusters. 
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First-order generalized estimating equations (GEE) have been developed for the analysis of 
longitudinal and other types of clustered data (Liang & Zeger, 1986). This approach does 
not require the specification of the joint distribution of the responses and is widely used to 
make inferences about the regression parameters on the marginal means. Recently, there 
has been increasing interest in the inferences on the association models. Second-order 
GEEs have been proposed to model the association between the responses. Prentice (1988) 
developed second-order GEEs with emphasis on the estimation of correlation parameters. 
Fitzmaurice & Laird (1993) proposed a model which parameterizes the association in 
terms of conditional odds ratios. Lipsitz, Laird & Harrington (1991), Liang, Zeger & 
Qaqish (1992), Carey, Zeger & Diggle (1993), Molenberghs & Lesaffre (1994), Lang & 
Agresti (1994), and Fitzmaurice & Lipsitz (1995) proposed models that parameterizes 
the association in terms of marginal odds ratio. Other discussion can be found in Yi & 
Cook (2002), Yi, He & Liang (2009, 2011) and He & Yi (2011), among others. 

When we model the mean and the association structures, we may face a large collection 
of covariates in which some of them are not important to feature the mean and the 
association structures of the responses. It is desirable to have a model selection approach 
which can select the significant covariates for both the mean and association models. In 
the literature, most variable selection procedures have been developed for the selection 
of covariates that influence the mean responses (e.g., Tibshirani, 1996, Fan & Li, 2002, 
2004; Qu & Li, 2006; Garcia, Ibrahim & Zhu, 2010; Wang, 2011). In a different direction, 
many other methods have been proposed to select covariance structures for multivariate 
Gaussian or binary data under the framework of graphical models (e.g., Meinshausen & 
Buhlmann, 2006; Yuan & Lin, 2007; Friedman, Hastie & Tibshirani, 2007). In Bondell, 
Krishna & Ghosh (2010) and Ibrahim, et al. (2011), fixed and random effects selection is 
considered in mixed effects models. So far, however, there has been little development on 
variable selections on mean and association models simultaneously. 
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When both the mean and association structures are modeled, we may be interested 
in model selection for mean model only, or for association model only, or for both mean 
and association models together. Suitable constraints may be imposed on the selection 
procedures to reflect individual analysis objectives. For example, in some applications, 
practitioners may add constraints that the exclusion of certain covariates from the mean 
model conceptually implies the exclusion of those covariates from the association model. 
This poses a technical question of how to properly incorporate such model constraints 
into selection and estimation procedures for the mean and association models. To address 
all of these concerns, we propose a hierarchical model selection strategy. In principle, we 
first select the covariates for the mean model, and then conditional on the selected mean 
model, we proceed to select variables for the association model. However, an important 
issue arises, as the selection procedure for the mean model involves estimation of the 
association parameters. To overcome this difficulty, we use the estimators obtained from 
the usual unpenalized GEE2 equations as the initial estimates. We first apply a one-step 
penalization on the mean parameters, then based on the penalized mean estimator, apply 
a one-step penalization on the association parameters. The penalized estimators obtained 
from the proposed HPGEE2 using a SCAD penalty is shown to have the oracle property. 
The asymptotic behavior of the penalized estimator under this hierarchical approach is 
investigated. An efficient two-stage penalized weighted least square algorithm is developed 
for the implementation of the proposed method. 

The rest of the article is organized as follows. Section 2 and 3 introduce the devel- 
opment of hierarchical penalized GEE2 method. Section 4 establishes the theoretical 
properties of the proposed method. Section 5 provides the details of the implementation 
of the algorithm. In Sections 6 and 7, empirical performance of the proposed penalized 
GEE2 is demonstrated through Monte-Carlo studies and the analysis of a clinical data 
set. 
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2. Notation and model formulation 

Suppose n clusters are randomly selected for the study. For cluster i = 1, ... ,71, let 

Yi = {Yn, ■ ■ ■ , Yi ni ) T be an «i x 1 binary response vector with mean E(Yi) = fa, and let 

4>ijk be the odds ratio between responses and Y ik (1 < j < k < rii) defined by 

PjYjj = 1, Y ik = l)P{Y l3 = 0, Y ik = 0) 
^ P(Y l3 = 0, Y lk = l)P{Y l3 = 1, Y lk = 0) • 

We consider a marginal model as follows: 1) g(faj) = xfj/3, where g(.) is a known link 

function, is a p x 1 vector of explanatory variables associated with Yij, and the /3 are 

regression coefficients to be estimated; 2) log^-fc = where ^-fe is a g x 1 vector of 

covariates which specifies the form of the association between Y^ and Y ik , and a is a 5 x 1 

vector of association parameters to be estimated. 

To ease computation, in contrast to the method of Prentice (1988), Carey, Zeger 

& Diggle (1993) proposed the alternating logistic regression method. The strategy is 

to estimate a using the ni Ci conditional events, Yy given Yq. and the covariates, for 

1 < j < k < rii with an appropriate offset: 

logitP(y ij = l\Y ik = y ik , Xij, z ijk ) = {zf jk a)y ik + log( _ ^ _ ^ jA _ ) , (1) 

1 faj fak y%jk 

where u ijk = E{Y ij Y ik \x ij ,Xi k , z ijk ) for 1 < j < k < n u and v { = [v ijk , 1 < j < k < riif . 
Let Ci denote the ni C*2- vector with elements 

Qjk = E{Yij\Y ik = y ik , x^, Zijk) = logit' 1 {{zf jk a)y ik + log(— — ^— — ^ )}, 

1 faj fa k ^ijk 

and let Ri be the vector of residuals with elements R%j k = Y^ — Qj k . Let Si denote the 
diagonal matrix with diagonal element Qj k (l — Cijk), and denote the matrix dQ/da T . We 
define A± = Fj — fa, E>i = cov(YJ|xj, Zj), and Cj = dfa/d(3 T , where Xi = (x^, 1 < j < rii) T 
and Zi = (zijk, 1 < j < k < rii) T . The two sets of equations for the mean and association 
parameters are given by 

n 
i=l 
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and 

n 

i=i 

The two equations are solved iteratively by alternating between two steps: 1) Given the 
association parameter a, estimate ft as a parameter in a marginal logistic regression using 
the first set of equations; 2) For a given (3, estimate the odds ratio parameter a using a 
logistic regression of Yj,j on each (k > j) with offset that involves faj and v^ k . Let 
Pa and a a denote the resultant estimators. Under regularity conditions, Pa and a. a are 
consistent estimators. 

3. Methodology 

3.1 Objectives and issues 

If we consider variable selection and estimation in GEE2 setting, there could be three 
different scenarios: a) We can select the mean model while the association parameters are 
assumed to be appropriate and therefore are consistently estimated; b) We are interested 
in the selection of association model while the mean parameters are assumed to be ap- 
propriate and therefore consistently estimated; c) We are interested in the simultaneous 
selection of mean and association parameters. Therefore, it is desirable to have a unified 
approach which can accommodate all of these scenarios. Furthermore, in the last scenario, 
we need to consider the relationship between the two sets of covariates Xij and Zij k , which 
can be classified as (1) {xij,x ik } H z ijk = $; (2) {x i:j ,x ik } = z ijk ; (3) {xij,x ik } n z ijk ^ $. 
Scenario (2) is actually a special case of scenario (3). In the latter two scenarios, the co- 
variates for the mean model are also considered as potential covariates for the association 
model. Then there may or may not exist constraints relating the selection of covariates. 
For example, some models may require that if Xjjj IS SI gnificant predictor with nonzero 
a x in the association model, it implies that $ij IS cl SI gnificant predictor with nonzero P x 
in the mean model. For instance, in the study of disease occurrence rates in a family, 
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if a genetic marker is not significant for the disease occurrence rates for each individual, 
scientists may speculate that this marker cannot influence the association between the 
disease occurrence among the family members. Therefore, for the underlying true model, 
it would be sensible to add such constraints as /3 x q = implying a X Q = 0. This would then 
create an additional issue for developing procedures for selecting mean and association 
models. To address all of these considerations, we propose a hierarchical model selection 
strategy. We first select the covariates for the mean model and then conditional on the 
selected mean model, we proceed to select the association model. However, some con- 
ceptual issues arise, as the selection procedure for the mean model involves estimation 
of the association parameters. But such difficulty can be solved if we use the estimators 
for the unpenalized GEE2 equations as the initial estimates and then apply a one-step 
penalization, first on the mean parameters, then based on the penalized mean estimator, 
apply a one-step penalization on the association parameters. 

3.2 Model selection and estimation 

We propose to perform the model selection for mean parameters by using: 

U P (P, a A ) - np' x (\/3\) © sign(/3) = 0, (2) 

where p' x (\(3\) = {P\{\Pi\), ■ ■ ■ ,p'\(\P P \)) T is a p-dimensional vector of penalty functions, 
sign(/3) = (sign(/3i), . . . , sign(/3 p )) T with sign(t) = I(t > 0) - I(t < 0), and J(.) is the 
indicator function. The notation denotes the component- wise product. Let (3 be the 
penalized estimator for the mean parameter, obtained by solving Equation (j2j). 

Next we perform the model selection for the association parameters by using the 
following function: 

U a {a.,f3) - np' x (\a\) 0sign(a) = 0, (3) 
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where = (p' A (|ai|), . . . ,p' x (\a q \)) T is a g-dimensional vector of penalty functions, 

and sign(a) = (sign(ai), . . . , sign(a g )) T . 

Remark 1: As we start with the consistent estimators /3a and a a, one-step penaliza- 
tion with proper penalty leads to consistent and sparse estimators (3 and a, with many 
of the elements estimated to be zero. Furthermore, we are able to show that such pe- 
nalized estimators enjoy the oracle property That is, with probability tending to one, 
the procedure selects the correct sub mean model and the correct sub association model. 
Our sequential two-stage approach allows the selection of covariates for the association 
to be conditional on the result from the selection of mean model. This property has an 
advantage to incorporate into selection procedures the constraints concerning the mean 
and association covariates. For example, if there is a model constraint that (3 x0 = 
leads to a x0 = 0, then if /3 X = 0, the procedure would automatically set a x = 0, so that 
the solution satisfies the constraint. Due to the stagewise nature, this method allows 
users to perform only model selection on either the mean parameters or the association 
parameters or both. If both steps utilize nonzero penalties, we jointly selects the mean 
and association models. If there are no constraints relating the mean and the association 
models, an alternative approach could be simultaneously solving the two sets of penalized 
estimating equations. This alternative procedure would involve alternating between the 
two penalized equations until convergence, which will be computationally more intensive 
than the proposed sequential two-stage method. 

Remark 2: In terms of the choice of penalty function, there are many penalty functions 
available. As the LASSO penalty, p\(\9i\) = X\9i\, increases linearly with the size of its 
argument, it leads to biases for the estimates of nonzero coefficients. To attenuate such 
estimation biases, Fan and Li (2001) proposed the SCAD penalty. The penalty function 
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satisfies p\(0) = 0, and its first-order derivative is 

p'M = \{I(0 < A) + (a ^~^ I(0 > A)}, for 9 > 0, 

where a is some constant, usually set to 3.7 (Fan and Li, 2001), and (t) + = tl{t > 
0) is the hinge loss function. The SCAD penalty is a quadratic spline function with 
knots at A and aX. It is singular at the origin which ensures the sparsity and continuity 
of the solution. The penalty function does not penalize as heavily as the Li penalty 
function on parameters with large values. It has been shown that the likelihood estimation 
with the SCAD penalty not only selects the correct set of significant covariates, but also 
produces parameter estimators as efficient as if we know the true underlying sub-model 
(Fan & Li, 2001). Namely, the estimators have the so-called oracle property. However, it 
has not been investigated if the oracle property is also enjoyed by hierarchical penalized 
estimating equations with the SCAD penalty. Furthermore, the asymptotic behavior of 
the hierarchical penalized GEE2 estimators needs to be investigated. In the next section, 
we will address these issues. 

4. Theory 

Let Po and ao denote the true value of /3 and a, respectively. Define Upp(j3,a) = 
dUf}(P 7 a)/<9/3 T , Up a (/3 } a) = dUp(/3, a)/da T , U aP (/3, a) = dU a (/3, a)/dfi T , and U aa (/3, a) = 
dU a ((3,a)/da T . Let Hpp((3,a) =Ep 0)Oi0 {-Upp(P,a)}, Hp a (f3,a) =Ep 0!ao {-U j3a (f3,a)}, 
H a p(JJ,a) = E/3 0>ao {U a p(f3,a)}, and H aa (/3,a) = Ep 0iao {-U aa ((3,a)} for the case with 
n — 1. We define the matrix of H(/3, a) as 

* Hpp{p,a), Hp Q (P,a) 
\y H a p((3,a), H aa ((3,a) J 

Denote the covariance matrix of the estimating equations for n = 1 as Vpp(j3, a) =Covp 0iOl0 
{Up((3,a)}, Vp a ((3,a) = V a p((3,a) T =Cov p 0iaa {Up((3, a), U a ((3,a)) T }, V aa (/3,a) =Covp 0:Ot0 



{U a (/3, a}. Furthermore, we define the matrix of V((3, a) as 

y V a p(/3,a), V aa (/3,a) J 
For notational convenience, let s denote the set of j such that (3jq ^ and s c denote the 
set of j such that /3jo = 0. Let v denote the set of j such that atjo 7^ and v c denote the set 
of j such that atjo = 0. To establish the asymptotic properties of the proposed penalized 
estimators, we assume certain regularity conditions which are listed in Appendix A. 

Theorem 1. Let G(P,&a) = Up((3,&A) — np' x (\/3\) sign(/3), and Gj(/3,aA)) be its jth 
element, j — 1, . . . , p. Given the SCAD penalty function p\(0), if X n — > 0, and \/n\ n — > oo 
as n — >■ oo, then there exist a solution j3 such that G(/3, a a) = 0, and \ \/3 — /3q\\ = O p {n^). 
Furthermore, we have 

lim P0j = 0) = 1, 

for all j such that (3jQ = 0, 

Theorem 1 establishes the existence of consistent estimator /3 to G(/3, a a) — 0. Fur- 
thermore, the estimator has the property of setting the nonsignificant mean parameter 
to zero with probability tending to one. Next, we establish the asymptotic distribution 
of the penalized mean estimator j3. Let (3 S = (/3 10 , . . . , P p 'o) T be the subset of nonzero 
mean parameter, Si = diag{pj' An |(/3i ), . . . ,Pa„(|/Vo|)}, and fo i = (PA n (/ 3 io) si g n (/ 3 io) ; • • • , 
p' Ari sign(/3j/o)) T . Let V ss denote the submatrix of V((3 ,a ) corresponding to the index 
subset s. 

Theorem 2. Given the SCAD penalty function p\{9) , if X n — > and y/n\ n — > oo, as n — > 
oo, then the sub-vector of the root-n consistent estimator f3 s has the following asymptotic 
distribution: 

V^(H SS + Si){/3 S - f3 s0 + (H ss + Sx)- 1 ^} -> N{0, V ss }, asn -> oo. 
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Now we investigate the properties of the penalized estimator for the association param- 
eter. Let J {ft, a) = Ua(/3, a) — np' x (\a\) © sign(a), where Jj(/3, a) denotes its jth element, 
j = 1, . . .,q. Let J v denote the set of penalized equations corresponding to indices set 
v, and U av denote the subset of U a corresponding to indices set v. Denote the inverse of 
H(fio, a ) as 

As H Pa = 0, it can be shown that H pa = 0, H pp = and H aa = H~^. Let H ss , H sv , 
H vs , and H m represent the submatrices of H(f3 ,a ), let H ss , H sv , H vs , H vv represent 
the submatrices of H((3 , cto) -1 , and let V ss , V sv , V vs , and V vv represent the submatrices 
of V(f3 , a ) corresponding to the subset of indices s and v. 

Theorem 3. Given (3 as the solution to the first set of penalized equation G(/3,&a) with 
the SCAD penalty function, if X n — > 0, and y/n\ n — > oo as n — > oo, then there exists 
a solution a such that J0,a) = 0, and \\a — ao\\ = O p (n^). Furthermore, we have 
lim^oo P(&j = 0) = 1, for all j £ v. 

Next, we need to establish the asymptotic joint distribution of the penalized esti- 
mator for the mean and the association parameters. Denote by a v = (ccio, • • • , £Vo) T 
the subset of nonzero association parameters, £ 2 = diag-fp^ (|aio|), • • • , P\ n (\ a q'o\)} , an d 
fc 2 = (p' An (/3io)sign(ai ), . . . , p' An sign(cv )) r 

Theorem 4. Given the SCAD penalty function, if X n — > and \/n\ n — > oo, as n — > oo, 
then the sub-vectors of the root-n consistent estimators /3 S and a v have the following joint 
asymptotic distribution: 



y/n 
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where 

B = 

y Hvs H vv + S2 1 
5. Implementation: penalized re- weighted least square lgorithm 

For notational convenience, Let C = (Cf , . . . , C^) T , A = (Aj, . . . , A^) T , and B be 
block diagonal with £>, as the diagonal elements. Let T = (Tf, . . . , ) T , i? = (Rj, . . . , 
Rn) T , and S be block diagonal with Si as the diagonal elements. Let u = C T B~ l A, 
D = C T B- X C, u* = T T S^R, and D* = T T S- X T. The initial estimates are 0® = fi A , 
and = a a- We employ the following two steps for selection and estimation purposes. 

• Stepl : Selection and estimation for the mean model using penalized estimating 
equation (j2J). 

— Outer loop: Based on the current estimate f3^\ we compute the updated esti- 
mate /3 (m) . We iterate the update, t = 1,2, 3, . . . , until \ \^-^ t+1) \ \ < e, the 
prespecified tolerance level. Using modified Fisher scoring method, we obtain 

n 

^+1) = p( t) _ {Y^c^) T BS {t \&A)- l cs (i) )r l 

n ^ V ( 4 ) 

{^C^^^^C^^a^-^C^^ + n^^JI^Dsign^)}. 

i=l Z=l 

Update the modified dependent variable Z0®) = C T 0^)/3® - A0®). Then 
the (3 lyt+l ^ solves the penalized weighted linear regression of responses Z on 
design matrix C with weight matrix B" 1 . That implies f3 lyt+l ^ minimizes the 
following objective function: 

\{Z0®) - C{fi®) T pfB{p®)- x {Z0®) - C0^) T (3) + nJ2p x (\ft\). 

1=1 

— Inner loop: By the coordinate descent method (Friedman et al, 2007) and the 
method of one-step local linear approximation (Zou and Li, 2008), we obtain 
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the update of the estimate sequentially for coordinates I = 1, . . . ,p: 



01 



D u 

where S(z, A) = sign(z)(|^| — A) + is the soft-thresholding operator. 

Throughout Step 1, a is fixed at the initial value a a- At the tth update of the 
outer loop, we calculate (3 t based on the updated Z, C and B. Nested within the 
outer loop, we cycle through the coordinates of flf' , I = 1 . . . ,p, 1 . . . ,p, . . . until 
convergence. Based on this new /3*, we update Z, C and B and proceed to the 
(t + l)th update of the outer loop. 

Step 2: Selection and estimation for the association model using penalized estimat- 
ing equation (J3]). 

— Outer loop: Based on the current estimate ay\ we compute the updated es- 
timate We update the offsets term in Equation (TjQ) using a® and (3. 
Then we perform the penalized offset logistic regression of on y^. with a 
total of Ylm ^2 observations. We iterate the update, t = 1,2,3,..., until 
\\a^' — < e, the prespecified tolerance level. 

— Inner loop: We update the penalized estimator a coordinate-wise for m = 
1, . . . , q, 1, . . . , q, . . . until convergence: 

(t+l) _ ~ Em'/m^mm'"L \ n P'\{\ a ™\)) + 



mm 



Throughout Step 2, /3 is fixed at the final value ft obtained from Step 1. 



6. Numerical studies 



We conduct simulation studies to assess the performance of the proposed methods under 
various circumstances. Binary response vectors yi = (yn,yi2, ■ ■ ■ ,yim) T are generated 
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from the joint probability function 

Hi 

fivn^ . . . , ifoj = n/'v n - /'o) 1 + x j<kPijk yij ~J: ijyik z^ ik }, 

where ^ = E(Y ij \X u Z { ), = /^(l - /%), fi ijk = P(Y i:j = 1, Y ik = l\X h Zi) and p ijk is 
the correlation coefficient of and Y^ given by = — PijPi k ) / ^VijV ik . The mean 
is modelled as 

logit/iy = P + ^ Px&ijl + ^ PzlZijV- 

1=1 l'=l 

The regression coefficients are set as (3 = —1.6, (3 X = (3.0, 0, 0, 1.5, 0) T , and (3 Z = 
(0,0, -1.5,0, 0) T . Covariates generated from a di-multivariate normal distribu- 

tion N(jj, x , E x ), and covariates are independently generated from a c? 2 -multivariate 
normal distribution N(p z , S z ). We set d\ = d 2 = 5, = 0.5 x /i^ = —0.2 x l d2 , E x 
as a di x di matrix with element being a x ij = <r x p x , S 2 as a d 2 x g?2 matrix with 
element being (7 a jj = c^ol , (7^ = cr^ = 1, and p x = p z = 0.5. 
The odds ratio <pij k between Yij and Y ik is modelled as 

d$ d4 

1=1 l'=l 

The regression coefficients are set as ao = 0.693, a w = (0.3, —0.3, 0, 0, 0) T , and a v = 
(0, 0, 0, 0, 0) T . Covariates Wij k i are generated from a c^-multivariate normal distribution 
N(p w , and covariates ify-jy are independently generated from a ^-multivariate nor- 
mal distribution N(p v , £„). We set d 3 = <i 4 = 5, = 0.5 x l d3 , p v = —0.2 x l d4 , T, w as a 
c?3 x <i 3 matrix with element being a W ij = a^px , S 2 asa^x matrix with 
element being a vi j = a^pl , cr^ = <7„ = 1, and p w = p v = 0.5. 

The cluster size rij is set as 5. We carry out three analyses of variable selection for 
three different scenarios: (1). We select the significant mean parameters with no se- 
lection conducted for the association parameters; (2). We focus on variable selection 
for the association parameters with no selection conducted for the mean parameters; 
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(3). We conduct variable selection for both the mean and the association parame- 
ters. In Analysis 1, we only perform the first stage penalized estimation on the mean 
parameters with the second stage being omitted. To select the optimum tuning pa- 
rameter, we used modified BIC information criterion, which takes the form: BIC = 
{E* U ip 0, «a)} t {E, Uif,0, & A )U if) 0, «a) t }~ 1 {E j Uif,0, a A )} + log n{£f =1 I0 t + 0)}. 
In Analysis 2, we perform the second stage penalized estimation on the association pa- 
rameters with the first stage having zero penalty and hence the mean parameter esti- 
mate equal to the unpenalized ALR estimate. The BIC takes the following form: BIC = 

{E, U ia A , &)} T {£, U ia 0A, &)Ui<*0A, A) T }" 1 {E l U ia 0A, a)}+logn{E^ =1 + 0)}. 
In Analysis 3, the corresponding BIC takes the following form: 

BIC ={J2 U if) 0, &a)} T {J2 & A)Uip0, «a) T }" 1 {^ 

i i i 

+ {J2 U ia 0, a)} T {J2 U ia 0, a)U ia 0, af}' 1 ^ U ia 0, &)} (5) 

i i i 

+ \ogn{J2 10i + J ( d - ± °)>- 

1=1 m=l 

The final model is chosen with the smallest BIC value. For Analysis 1, sample sizes with 
n = 200, 500, and 1000 are considered, while for Analysis 2 and 3, we choose sample sizes 
with n = 500, 1000 and 2000 are considered. One hundred data sets are simulated for 
each parameter setting. 

Table 1 summarizes the performance of the variable selection procedure for the mean 
parameters, i.e., the results for Analysis 1. There are 11 mean parameters, 4 of which 
are nonzero and 7 of which are set to zero. The procedure is implemented with both 
LASSO and SCAD penalties for comparison. The optimum tuning parameter A is chosen 
through the BIC criterion. Based on the output of the penalized estimates, average 
positive selections (PS) and average false discoveries (FD) are calculated. The penalized 
estimator using the SCAD penalty together with the BIC information criterion is able to 
achieve average PS = 4, the true number of nonzero coefficients. It also achieves small 
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average FD = 0.36 for n = 500 and average FD = 0.02 for n = 1000. In comparison, 
the penalized estimator using the LASSO penalty with the BIC information criterion is 
also able to achieve average PS = 4. However, it consistently has higher average false 
discoveries: average FD = 2.885 for n = 500 and average FD = 2.719 for n = 1000. 

Table 2 summarizes the performance of the variable selection procedure for the asso- 
ciation parameters, i.e., the results for Analysis 2. There are 11 association parameters, 
3 of which are nonzero and 8 of which are set to zero. The penalized estimator using 
the SCAD penalty together with the BIC information criterion is able to achieve positive 
selections close to the true number of nonzero coefficients: PS = 2.4 for n = 500 and 
PS = 2.98 for n = 2000 . It has average FD = 3.27 for n = 500 and average FD = 2.91 
for n = 2000. This result shows that not surprisingly, compared to the selection of mean 
parameters, variable selection of association parameters has higher number of false dis- 
coveries. The penalized estimator using the LASSO penalty with the BIC information 
criterion is also able to achieve similar positive selections: PS = 2.53 for n = 500 and 
PS = 2.93 for n = 2000 . However, compared to the SCAD penalty, the estimator with 
the LASSO penalty consistently has higher average false discoveries: average FD = 4.65 
for n = 500 and average FD = 4.67 for n = 2000. 

Table 3 summarizes the performance of the variable selection procedure for the joint 
selection of mean and association parameters, i.e., the results for Analysis 3. There are 22 
parameters in total, 7 of which are nonzero and 15 of which are set to zero. The penalized 
estimator with the SCAD penalty maintains its satisfactory performance. Its average PS 
is always close to the true number of nonzero coefficients: PS = 6.77 for n = 500 and 
PS = 6.99 for n = 2000. It has average FD = 7.13 for n = 500 and average FD = 3.39 
for n = 2000. Among the false discoveries, most are attributed from falsely identified 
association parameters. Its FD improves when sample sizes increases. 

In summary, the penalized estimator based on the SCAD penalty tends to outperform 
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the penalized estimator based on the LASSO penalty. The selection procedures for the 
mean model perform better than those for the association model. Both high sensitivity 
and selectivity are achieved for the selection of mean parameters, while for the selection 
of association parameters, only sensitivity is high but selectivity is moderate. This phe- 
nomenon is consistent with the estimation results. For example, based on one random 
data set with n = 2000 observations generated from the setting above , the 8 unpenal- 
ized estimates for the zero association parameters have values range from 0.006 to 0.27, 
and assciated standard errors ranging from 0.09 to 0.11; in contrast, the 7 unpenalized 
estimates for zero mean parameters have values range from 0.001 to 0.04 and associated 
standard errors range from 0.04 to 0.07. 

It is also noted that when the modified BIC is used with the LASSO estimator, the 
variable selection exhibits high false discovery rates. This is because the LASSO penalty 
increases linearly with the size of the estimator, which leads to large biases for large 
nonzero parameters. This forces the procedure to select smaller tuning parameter to 
make the first term in BIC criterion small. By comparing the average optimum tuning 
parameters in Tables 1, 2 and 3, we can see the the modified BIC consistently selects 
smaller tuning parameters and hence has higher false discoveries for the LASSO estimator 
than the SCAD estimator. 

7. Data analysis 

We apply the proposed method to analyze the data arising from a smoking cessation study 
( Gruder et al., 1993, Hedeker & Gibbons, 2006). The data contain repeated measure- 
ments at four different time points for 489 individuals. The outcome is the dichotomous 
measurement representing whether or not an individual has quit smoking. Data were col- 
lected at four telephone interviews: post-intervention, and 6, 12, and 24 months later. The 
subjects are divided into four groups by the treatments they received: (1). randomized 
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to the control condition; (2). randomized to receive a group treatment, but never showed 
up to the group meetings; (3). randomized to and received group meetings; and (4). 
randomized to and received enhanced group meetings. In the analysis, these four groups 
were compared using Helmert contrasts. The three treatment contrasts were denoted as 
HI, H2, and H3. Additional covariates that may influence the probability of smoking 
abstinence include time (T), a race indicator (race), an indicator of TV intervention (tv) 
and an indicator of manual intervention (manual). 

We consider a mean model with the main covariates effects and several interaction 
terms included: 

logityUjj = fio + 0\Tj + f3 2 Tj + f3 3 Hli + ^H2i + f3 5 H3i + f3 6 racei + f3 7 tVi + (3 8 manuali 
+ (3 g Tj x Hli + p l0 Tj x H2i + ^ x H3 t . 

(6) 

For the association structure, we examine the empirical correlation matrix among the four 
repeated measurements, and it shows that the correlation becomes weaker as the time 
interval gets larger. Therefore, for the association structure, we consider the model 

log 4>ijj> = a + ai\Tj — Tji \ + a 2 \Tj — Tj>\ 2 . 

We perform penalized estimation and variable selection on both mean and associa- 
tion parameters simultaneously. Using the modified BIC, we obtain the optimum tuning 
parameter A = 0.09. Table 4 summarizes the result of the penalized estimators obtained 
from the proposed two-stage penalized estimating equations. Through variable selection, 
7 out of the 12 mean parameters are estimated to be nonzero, and 2 of the 3 association 
parameters are estimated to be nonzero. Both the linear and quadratic time parameters 
are significant, which implies the overall change in smoking abstinence involves linear and 
quadratic time effect. As the estimate for the linear effect is negative and the estimate 
for the quadratic effect is positive, a decelerating negative trend is suggested. Among the 
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three treatment contrasts, H2 is estimated to have zero effect. This implies that whether 
or not showing up to the group meetings is not associated with subsequent cessation. 
HI is found to be significant, which implies whether randomization to group versus to 
control had effect on subsequent cessation. H3 is also found to be significant, indicating 
that the type of meetings influenced the outcome. The race indicator is not found to be 
significant, indicating that there is no evidence of suggesting different effects between the 
white population and other ethnic groups. The TV intervention and manual instructions 
are found to significantly increase the probability of smoking abstinence. All of the inter- 
action effects between time and treatments are estimated to be zero. For the association 
structure, it seems that the log odds ratio does decrease linearly with the time difference. 
The quadratic term of the time difference is found to be insignificant in modelling the 
log odds ratio. Compared to the unpenalized estimates by the ALR method, the two 
methods seem in good agreement. The parameters that are thresholded to zero by our 
penalized method are those with large p-values generated by ALR method. As the tuning 
parameter is selected by the data driven information criterion, our method achieves the 
variable selection without employing any preset significance level in contrast to traditional 
variable selection methods. 

8. Conclusion 

We present a hierarchical two-stage procedure to perform simultaneous selection and es- 
timation on generalized estimating equations for which both mean and association struc- 
tures are modelled. The asymptotic behavior of the penalized estimates has been es- 
tablished. The numerical results and data analysis illustrate the practical utility of the 
proposed method and demonstrate satisfactory performance. The proposed method can 
be modified to deal with data with more complex association structures. For example, 
clustered data frequently arise in longitudinal studies. Common examples include school- 
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based longitudinal studies and community-based longitudinal survey. For such data, asso- 
ciation structures typically involve three types of correlation: the clustering effect among 
subjects within cluster at a given time point, serial correlation of replicate measurements 
within subjects, and mixed effects of different subjects within the same cluster at different 
time points. We may adapt the discussion of Yi & Cook (2002) to develop a simultaneous 
selection and estimation method to handle such data. 
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Appendix a 

Assumptions of regularity conditions: 1) H((3 , a ) and V((3 , a ) are both finite and 
positive definite. 2) For any e > 0, there exist r\ and S such that for any \/3 — /3q\ < r], and 
\a — a \ < 5, we have 

\H(Po,a )\(l-e) < \H(f3,a)\ < \H(J3o, a )\(l + e), 

and 

\V(p ,a )\(l-e) < \V(f3,a)\ < \V(J3 , a )\ (1 + e), 

and the inequalities hold true componentwise. 3) For n — 1, the covariance matrices 

Var^, (U P p(P,P)), Var^ 0)ao (Upp(P,a)), Var^,^ (Upp(a,P)), Vara 0iQ!0 (Upp(a,a)) are 
finite and positive definite. 

Appendix b : proof of the theorems 

Proof to Theorem Q] 
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Proof. Consider a ball \ \/3 — /3o|| < Mn 2 for some finite M. Applying Taylor Expansion, 
we obtain: 

Gj{l3,a A ) = U^(f3,a A ) - np' Xn (\f3 j \)sign(f3 j ) 

(7) 

= Up.(J3 ,a A ) + ]T(A - MdU fij {P*,a A )m - np' K ( |/%|) sign (ft), 
z=i 

for j = 1, . . . ,p, and some 0* between (3 and /?o- As E{Up j (/3 , = 0, Up A fa, a A ) = 
O p (n^). As — /3| < Mra"5 and — «o = O p (n~5) ; Upp(/3*,a A ) = O p (n) component- 
wise. First we consider j e s c . Because lim inf^oolim mie^ 0+ p' Xn (f3) / \ n > 0, and A n — > 0, 
and \/n\ n — >■ 00 as n — > 00, the third term dominates the the first two terms. Thus the 
sign of Gj((3, a A ) is completely determined by the sign of f3j. This entails that inside this 
Mn' 1 / 2 neighborhood of /3 , Gj(/3, a A ) > 0, when /3j < and Gj(j3, a A ) < 0, when (5j > 0. 
Therefore for any f3 inside this ball, if f3j = 0, it solves the equation Gj((3,a A ) = with 
probability tending to one. This entails for any finite constant M, and any sequence of 
ft, such that /3j = 0, and \\/3 — (3q\ \ < Mn~^, we have lim^oo P(Gj0, a A ) = 0) = 1, for 
all j G s c . 

Next we consider j 6 s. For n large enough and (3j 7^ 0, p' A (|/^ |) = and p' x {\ (3 [0 \) = 0. 
For notational convenience, denote Wpp = —Upp(f3*,a A ), H = H(f3 ,a ), and H* = 
H((3*, a A ). Let (3 S denote the sub- vector of (3, G s denote the subset of penalized equations 
in G, Up s denote the subset of equations in Up and W* s denote the sub-matrix of Wip, 
H ss denote the sub-matrix of H, and H* s denote the sub-matrix of H* corresponding to 
the subset of indices s. This leads to the formulation: 

G,{p, a A ) = Up s (Po, a A ) - W* ss {Ps - &o). (8) 

Because of weak law of large numbers, W* s = n(H ss + H* s — H ss + o p (l)). Let l s de- 
note a vector of ones with length equal to the cardinality of s. For a given e, choose 
n sufficiently large, so that \(H* S — H SS )H~ 1 1 S \ < el s componentwise. Choose (3 S — 
(3 s0 = H~ s l l s Cn^ so that \\(3 S - (3 s0 \\ < MnT*. This entails W* ss ((3 n s - (3 s0 ) = n(H ss + 
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H* s - H ss + o p {l))H~}l s Cn~^ = Cn^(l s (l + e) + o p (l)). By choosing M large enough, 
hence C large enough, we have the second term of equation (jHJ) dominating the first 
term. Therefore, if (3 S — (3 s0 = H~^l s Cn~^ , G s (/3,&a) < 0, and similarly if (3 S — (3 s0 = 
—H^lsCn - ^, G s (/3,&a) > 0. Because G s ((3,ctA) is continuous on this compact set 
f3 = {((3 s o + H~^l s dn~^ , p _ p /) T ; — C < d < C}, where p _ p / denotes a vector of zeros 
of length p — p', and p 1 is the cardinality of s. Therefore, there exists a (3 that lies in this 
compact set and G s (j3, &a) = 0. 

□ 

Proof to Theorem [2] 

Proof. Based on Taylor expansion presented in Proof to Theorem 1, we have 

= G.0, & A ) = UpM, & A ) - W* SS S - sO ) - nb, - nL\0 a - & ), (9) 

where T/[ = diag{p" n (|/3]" |), . . . ,/^ n (|/3* |)}, and (3* lies between (3 and (3 . This entails 

(W; 8 + nEt)- 1 ^ - (3 s0 ) = U Ps (p , & A ) - nh. 

As (3 — > 0o in probability, -W* s — > H ss in probability and X* — >■ Si in probability. It can 
also be shown that 

1 TT (R * \ 1 TT (R \A 1 f ,dUp s (Po,(X*) 

UpM,a A ) = -i=Up s {Po,ao) + —^{a A - a )- 



n \ n \ n oa 



As \/n(aA — chq) is bounded in probability and \~^ M ^~~^ E dUps ^°' a ^ = in prob- 
ability, the limiting distribution of ^Up a (f3o,&A) is N{0,V ss }. According to Slutsky's 
theorem, we have^(H ss + E x ){& - (3 s0 + (H ss + E 1 )~ 1 b 1 } -»• iV{0, V afl )}. □ 

Proof to Theorem [3] 

Proof. Consider a ball 1 1 or — qjq 1 1 < Mn~? for some finite M. Applying Taylor Expansion, 
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we obtain: 

Jj0,<x) = U aj 0,a) - np' An (|a j |)sign(a i ) 

p 

= U aj (A), oo) + J20 t - 0m)dU aj (/T, a*)/d(3 t (1Q) 
+ ^(«m - o: m o)9t/ CKj .(/?*,a*)/9a m - np' An (|a i |)sign(a i ), 

m=l 

for some /3* between (3 and /?o and some a* between a and cto- As E(U a .((3o,ao)) = 0, 
U aj {f3o, «o) = O p (ri2). As |/3 - /3| = O p (n~^) and a - a < Mn~^, U a p(/3*,a*) = O p (n) 
and U aa ((3*, a*) = O p (n) componentwise. First consider j ^ v. Note that the first three 
terms are all of order O p {n 1 ^ 2 ). As lim inf^^lim inf a ^o+PA„ ( a ) / > 0, and \ n — > 0, 
and n^X n — > oo as n — >■ oo, the four term dominates the the first three terms. Thus the 
sign of Jj(j3,a) is completely determined by the sign of aj. This entails that inside this 
MrC 1 ! 2 neighborhood of a , Jj((3,a) > 0, when ctj < and Jj(j3,a) < 0, when aj > 0. 
Therefore for any a inside this ball, if aj = 0, it solves the equation Jj((3,a) = with 
probability tending to one. This entails for any finite constant M, and any sequence of 
a, if oij = 0, and \\a — ao\\ < Mn"^, we have lim^oo P(Jj0, a) = 0) = 1, for all j v. 

Next we consider j G v. It is known that for n large enough, and <x, ^ 0, p' x (\(^jo\) = 
and P\(\ajo\) = 0. Let W* a = —U a0l (/3*,a*), and W* v denote the sub-matrix of W* a . Let 
W*^ = —U a /3(/3*, a*), and W* s denote the sub-matrix of W*p corresponding to indices set 
v and s. Then Equation ffTOl) can be expressed as 

J v 0,a) = U av (p ,a ) - W* vs 0s - fro) - W* vv {a v - a*,). (11) 

From the proof to Theorem 2, we have (f3 s — (3 s0 ) = H~}Up s (fioi a o) + °p(l)- Furthermore, 
we have W* s —> H vs in probability and W* v — > H vv in probability. Therefore, Equation 
flTTJ can be simplified as 

J v 0,a) = U av (/3 ,a ) - H vs H~^Up s {^ a o) ~ H vv (a v - a v0 ) + o p (l). (12) 
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We choose a v - a v0 = H vs Cm^ + H vv C 2 n^. Because H^H' 1 + H VV H VS = 0, the left side 
of the equation can be expressed as 

-H va H^{Up a {Po,oto) - Cin*) + (U av (#,, a ) - C 2 n*). 

Because both ^ s (/3 ,«o) and U av (j3o,ao) are of order O p (n%), we can choose M large 
enough, hence C\ and C 2 large enough so that both Up s ({3 ,a ) and U av ((3 ,a ) are 
dominated by C\n^, C 2 n^ . Furthermore, J v 0,a) is continuous on this compact set 
{(H vs pCin^ + H vv pC 2 n^, q - q >) T ; — 1 < p < 1}, where q - q > denotes a vector of zeros 
of length q — q', and q' is the cardinality of v. The sign of J v (/3,a) is opposite when 
p = —1, and 1. Therefore, there exist a a that lies in this compact set and J v ($, a) = 0. 

□ 

Proof to Theorem [4] 

Proof. Based on the proofs to Theorem 1, 2 and 3, Taylor expansions of the penalized 
estimating equations at the penalized estimators can be expressed as: 

(Ps ~ Pso)(W* s + nEx) + (a A - a )Wp a = Up s (f3 , a ) - ra&i, 

and 

0s - &o)OC) + (a s - a s0 )(W: v + n£ 2 ) = U av (/3 , a ) - nb 2 . 

Because \W* S -> H ss in probability, \W% a -> Hp a = in probability, \W* V ->■ in 
probability, we have 

(4 - + SO = Up.(p ,a )/n - h + o p (l), 

and 

(4 - f3 s0 )(H vs ) + (d„ - a„o)(#mj + S 2 ) = U av (/3 ,a Q )/n -b 2 + o p (l). 
This implies 
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According to Slutsky's theorem and central limit theorem, the joint distribution of j3 s and 
a v converges in distribution to the joint multivariate normal distribution. □ 

A PPENDIX C: VARIANCE ESTIMATE 

Here we outline how to evaluate the standard errors of the penalized estimators. The 
consistent estimate for the negative Hessian matrix H is denoted as 



H 



where Fi = dQ/dfi. The consistent estimator of the variance matrix V of the score vectors 
is denoted as 



V 



with U i/3 0, a) = Q0, a) T B0, a)r l Ai0, a), and U ia 0, a) = T t 0, a) T S0, a)J 1 R i 0, a). 
We also define 

A = diag^dAD/i/H . . . ,p!(|/VI)/I/VI}> 

and 

E 2 = diag{p' A (|di|)/|di|, . . . ,p' x (\a p >\)/\a q >\}. 



Then estimated covariance matrix for y/n 
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-Pa 
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Vss 


Vsv 
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Table 1: Positive selections (PS) and false discoveries (FD) for variable selection of mean 
parameters with 4 nonzero coefficients and 7 zero coefficients 







LASSO 






SCAD 




n 


A 


PS 


FD 


A 


PS 


FD 


200 


0.019 


1 


3.690 


0.064 


■1 


1.620 




(0.011) 


(0) 


(1.830) 


(0.012) 


(0) 


(1.135) 


500 


0.015 


■1 


2.885 


0.058 


■1 


0.360 




(0.006) 


(0) 


(1.486) 


(0.015) 


(0) 


(0.578) 


1000 


0.011 


■1 


2.719 


0.053 


■1 


0.020 




(0.005) 


(0) 


(1.412) 


(0.013) 


(0) 


(0.140) 



(PS denotes the number of correctly identified nonzero coefficients ;FD denotes the number of zero coefficients incorrectly 
estimated to be nonzcro;numbers without parenthesis are average values; numbers with parenthesis are standard dcviations;A 
denotes the average optimum tuning parameter.) 



Table 2: Positive selections (PS) and false discoveries (FD) for variable selection of 
association parameters with 3 nonzero coefficients and 8 zero coefficients 







LASSO 






SCAD 




11 


A 


PS 


FD 


A 


PS 


FD 


500 


0.027 


2.530 


4.650 


0.074 


2.400 


3.270 




(0.020) 


(0.688) 


(1.977) 


(0.032) 


(0.752) 


(2.004) 


1000 


0.022 


2.730 


4.480 


0.054 


2.720 


3.120 




(0.015) 


(0.510) 


(2.254) 


(0.021) 


(0.570) 


(2.076) 


2000 


0.014 


2.930 


4.670 


0.040 


2.980 


2.910 




(0.009) 


(0.256) 


(2.040) 


(0.012) 


(0.141) 


(1.730) 



( PS denotes the number of correctly identified nonzero coefficicnts;FD denotes the number of zero coefficients incorrectly 
estimated to be nonzcro;numbers without parenthesis are average values; numbers with parenthesis are standard dcviations;A 
denotes the average optimum tuning parameter.) 
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Table 3: Positive selections (PS) and false discoveries (FD) for variable selection of both 
mean and association parameters with 7 nonzero coefficients and 15 zero coefficients 







LASSO 






SCAD 




n 


A 


PS 


FD 


A 


PS 


FD 


500 


0.015 


6.840 


9.430 


0.042 


6.770 


7.130 




(0.007) 


(0.368) 


(2.508) 


(0.005) 


(0.423) 


(1.662) 


1000 


0.012 


6.940 


9.380 


0.041 


6.880 


5.430 




(0.005) 


(0.278) 


(2.420) 


(0.006) 


(0.356) 


(1.486) 


2000 


0.009 


7.000 


9.380 


0.040 


6.990 


3.390 




(0.004) 


(0) 


(2.490) 


(0.006) 


(0.100) 


(1.421) 



( PS denotes the number of correctly identified nonzero cocfficicnts;FD denotes the number of zero coefficients incorrectly 
estimated to be nonzero;numbers without parenthesis are average values; numbers with parenthesis are standard dcviations;A 
denotes the average optimum tuning parameter.) 



Table 4: Penalized estimation of smoking cessation study data set 





ALR 




PGEE2 






variable 


estimate 


SE 


estimate 


SE 


intercept 


-1.280 


0.140 


1.229 





.123 


time 


-0.850 


0.145 


-0.778 





.142 


time 2 


0.238 


0.044 


0.210 





,043 


hermertl 


0.563 


0.211 


0.321 





.175 


hermert2 


0.225 


0.149 










hermert3 


0.324 


0.140 


0.255 





.121 


racew 


0.295 


0.210 










tv 


0.512 


0.201 


0.545 





.199 


manual 


0.516 


0.203 


0.505 





192 


timeXhl 


-0.152 


0.077 










timeXh2 


-0.080 


0.060 










timeXh3 


-0.049 


0.066 










intercept 


3.582 


0.442 


3.103 





.163 


timcdiff 


-1.068 


0.525 


-0.477 





.078 


timediff 2 


0.155 


0.141 











(The first part of the table includes mean parameters and the second part of the table includes the association parameters.) 
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